An Algorithm for Estimating All Matches Between Two

Authors

  • Mikhail J. Atallah

Abstract

We give a randomized algorithm for estimating the score vector of matches between a text string of length N and a pattern string of length M; this is the vector obtained when the pattern is slid along the text and the number of matches is counted for each position. The randomized algorithm takes deterministic time O((N/M) Conv(M)), where Conv(M) is the time for performing a convolution of two vectors of size M each. The algorithm finds an unbiased estimator of the scores, whose variance is particularly small for scores that are close to M, i.e., for approximate occurrences of the pattern in the text. No assumptions are made about the probabilistic characteristics of the input, or about the number of different symbols appearing in the text T or the pattern P (i.e., the alphabet size need not be much smaller than M). The solution extends to the weighted case and to higher dimensions.
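The abstract does not spell out the construction, so the following is only a minimal sketch of one way such an unbiased estimator can be obtained. It assumes each symbol is mapped to a randomly chosen root of unity, so that a matching pair contributes exactly 1 and a mismatching pair contributes 0 in expectation; the names `exact_scores`, `correlate`, and `estimate_scores`, the `trials` parameter, and the use of a single FFT over the whole text (rather than N/M convolutions of size M) are all assumptions made for illustration, not details taken from the paper.

```python
import cmath
import random


def exact_scores(text, pattern):
    """Score vector: number of matching symbols at each alignment."""
    n, m = len(text), len(pattern)
    return [sum(text[i + j] == pattern[j] for j in range(m))
            for i in range(n - m + 1)]


def fft(a, invert=False):
    """Recursive radix-2 FFT (unnormalized); len(a) must be a power of two."""
    n = len(a)
    if n == 1:
        return list(a)
    even = fft(a[0::2], invert)
    odd = fft(a[1::2], invert)
    sign = 1 if invert else -1
    out = [0j] * n
    for k in range(n // 2):
        w = cmath.exp(sign * 2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + w
        out[k + n // 2] = even[k] - w
    return out


def correlate(x, y):
    """Return c[i] = sum_j x[i + j] * conj(y[j]) for every full alignment i."""
    n, m = len(x), len(y)
    size = 1
    while size < n + m:
        size *= 2
    # Reversing conj(y) turns the correlation into an ordinary convolution.
    ry = [v.conjugate() for v in reversed(y)]
    fx = fft(list(x) + [0j] * (size - n))
    fy = fft(ry + [0j] * (size - m))
    conv = [v / size for v in fft([a * b for a, b in zip(fx, fy)], invert=True)]
    return [conv[i + m - 1] for i in range(n - m + 1)]


def estimate_scores(text, pattern, trials=20, seed=0):
    """Average of `trials` independent random-encoding estimates (assumed scheme)."""
    rng = random.Random(seed)
    alphabet = sorted(set(text) | set(pattern))
    q = len(alphabet)
    totals = [0.0] * (len(text) - len(pattern) + 1)
    for _ in range(trials):
        # Independent uniform exponent per symbol; equal symbols always cancel to 1.
        root = {s: cmath.exp(2j * cmath.pi * rng.randrange(q) / q) for s in alphabet}
        c = correlate([root[s] for s in text], [root[s] for s in pattern])
        for i, v in enumerate(c):
            totals[i] += v.real
    return [t / trials for t in totals]
```

Under this assumed encoding, an alignment where the pattern occurs exactly is estimated as M with no randomness at all, since every term equals 1; alignments with more mismatches carry more noise, which can be reduced by raising `trials`. For the text "abracadabra" and the pattern "abra", `exact_scores` returns `[4, 0, 1, 1, 1, 1, 0, 4]`, and `estimate_scores` gives 4 (up to floating-point error) at the first and last positions.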

Similar resources

An EM Algorithm for Estimating the Parameters of the Generalized Exponential Distribution under Unified Hybrid Censored Data

Unified hybrid censoring is a mixture of the generalized Type-I and Type-II hybrid censoring schemes. This article presents statistical inference on the parameters of the Generalized Exponential Distribution when the data are obtained from the unified hybrid censoring scheme. It is observed that the maximum likelihood estimators cannot be derived in closed form. The EM algorithm for computing the ma...

Estimating Land Surface Temperature in the Central Part of Isfahan Province Based on Landsat-8 Data Using Split-Window Algorithm

Land surface temperature (LST) is used as one of the key sources to study land surface processes such as evapotranspiration, development of indexes, air temperature modeling and climate change. Remote sensing data offer the possibility of estimating LST all over the world with high temporal and spatial resolution. Landsat-8, which has two thermal infrared channels, provides an opportunity for t...

A New Similarity Measure Based on Item Proximity and Closeness for Collaborative Filtering Recommendation

Recommender systems utilize information retrieval and machine learning techniques for filtering information and can predict whether a user would like an unseen item. User similarity measurement plays an important role in collaborative filtering based recommender systems. In order to improve accuracy of traditional user based collaborative filtering techniques under new user cold-start problem a...

An Improved Big Bang-Big Crunch Algorithm for Estimating Three-Phase Induction Motors Efficiency

Nowadays, most of the generated electrical energy is consumed by three-phase induction motors. Thus, in order to carry out preventive measurement and maintenance and eventually to employ high-efficiency motors, the efficiency evaluation of induction motors is vital. In this paper, a novel and efficient method based on the Improved Big Bang-Big Crunch (I-BB-BC) Algorithm is presented for efficiency e...

Estimating the Parameters in Photovoltaic Modules: A Constrained Optimization Approach

This paper presents a novel identification technique for estimating unknown parameters in photovoltaic (PV) systems. A single-diode model with five unknown parameters is considered for the PV system. Using information from the standard test condition (STC), three unknown parameters are written as functions of the other two parameters in a reduced model. An objective function and ...

Publication date: 1997